Handling of digital silence in audio fingerprinting

ABSTRACT

The invention relates to a method, a device, a client-server system as well as a computer program product and computer program element for handling digital silence when fingerprinting digital media signals. A fingerprint comprising a number of sub-fingerprints for at least a part of the digital media signal is generated (step 42), and the influence of at least one piece of the media signal on the fingerprint is removed or changed (step 48), which piece corresponds to digital silence. The invention reliably avoids a wrong identification of media signals, such as audio signals, where digital silence is included. The invention is also easy to implement, since it only requires some of the functionalities already provided in a computer.

TECHNICAL FIELD

The present invention generally relates to the field of fingerprinting of digital media signals, such as audio, and more particularly to the generation of fingerprints when a part of the digital media signal includes digital silence.

DESCRIPTION OF RELATED ART

It is known to provide fingerprints for media signals such as audio signals in order to identify a certain piece of music. A local computer then generates a fingerprint for an audio signal and sends this fingerprint as a query to a database. In the database the fingerprint is compared with other fingerprints and if a match is found, it is returned to the local computer, which then has received an identification of the audio signal.

Such fingerprinting is useful in many applications, for instance in radio stations for identifying play lists, but there is also a growing market for private persons wanting to buy music after having identified it, for instance on the radio.

One such fingerprinting scheme is described in “A Highly Robust Audio Fingerprinting System”, by Jaap Haitsma and Ton Kalker, Ismir, October 2002, where fingerprints are made up of a number of sub-fingerprints. A sub-fingerprint is based on a part of the media signal. 256 consecutive sub-fingerprints, which we will refer to as the fingerprint or fingerprint block, are computed during a short time interval in order to provide a fast and safe identification of the media signal. A fingerprint can therefore be taken on, for example, the first three seconds of a media signal. A positive identification is made in a fingerprint database if the Hamming distance between the derived fingerprint and a fingerprint in the database is below a certain threshold.

A problem of the known fingerprinting schemes is that the media signal can often have parts that are made up of digital silence. An audio clip might for instance start with silence, where for instance the PCM samples have a value of zero, and a video clip can start with a number of black frames. This means that sub-fingerprints made in the beginning, during this digital silence, will be identical and reflect that no information is present. Since a lot of different media signals or files can have this digital silence in the beginning, it is possible that a query with a fingerprint made on the beginning would be found to wrongly correspond to several different stored media signals in the database.

SUMMARY OF THE INVENTION

It is thus an object of the present invention to provide fingerprinting where the effects of digital silence in a media signal are removed such that fingerprinting can be used with a diminished risk of identifying the wrong media signal.

According to a first aspect of the present invention, this object is achieved by a method of handling digital silence when fingerprinting a digital media signal comprising the steps of:

generating a fingerprint comprising a number of sub-fingerprints for at least a part of the digital media signal, and

removing or changing the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.

According to a second aspect of the present invention, this object is also achieved by a device for handling digital silence when fingerprinting digital media signals and comprising:

a fingerprint generating unit arranged to generate a fingerprint comprising a number of sub-fingerprints for at least parts of a digital media signal, and

a digital silence removal unit arranged to remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.

According to a third aspect of the present invention, this object is furthermore achieved by a system of devices for handling digital silence when fingerprinting digital media signals and comprising:

a server device having a database of fingerprints related to media signals stored as media files, and

a client device for generating fingerprint queries to the server device, wherein at least one of client and server device comprises:

a fingerprint generating unit arranged to generate a number of sub-fingerprints for at least parts of a digital media signal, and

a silence removal unit arranged to remove or change the influence of at least one piece of the media signal on the fingerprinting, which piece corresponds to digital silence.

According to a fourth aspect of the present invention, this object is also achieved by a computer program product for handling digital silence when fingerprinting digital media signals, to be used on a computer, comprising a computer readable medium having thereon:

computer program code means, to make the computer execute, when said program is loaded in the computer:

generate a number of sub-fingerprints for at least parts of a digital media signal, and

remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.

According to a fifth aspect of the present invention, this object is also achieved by a computer program element for handling digital silence when fingerprinting digital media signals, to be used on a computer, said computer program element comprising: computer program code means, to make the computer execute, when said program is loaded in the computer:

generate a number of sub-fingerprints for at least parts of a digital media signal, and

remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.

Claims 2 and 3 are directed towards removing the cause for digital silence.

Claim 4 is directed towards adding random values to the whole media signal.

Claims 5 and 16 are directed towards providing random values for changing the influence of digital silence.

Claims 6 and 17 are directed towards replacing sub-fingerprints representing digital silence with random values.

Claims 7 and 18 are directed towards replacing samples of the media signal representing digital silence with random values.

Claim 8 is directed towards providing different types of random number generations in a client and a server device.

Claims 10 and 19 are directed towards processing the random number with time and date information related to the generation of a fingerprint for lowering the probability of false identifications of media signals.

The present invention has the advantage of reliably avoiding a wrong identification of media signals in which digital silence is included. It is also easy to implement, since it only requires some of the functionalities already provided in a computer. In a variation of the invention, the random numbers generated almost certainly do not cause false identifications.

The general idea behind the invention is thus to remove digital silence related to media signals or to replace it with random values when generating fingerprints for the media signal.

The expression digital silence is intended to comprise digital audio signals where the information in the signal represents no sound or sound below a certain low threshold where different valued sub-fingerprints are not possible to generate, as well as digital video information where the information in the frames represents black or is below a certain threshold in which no images are discernible.

These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will now be explained in more detail in relation to the enclosed drawings, where

FIG. 1 shows a block schematic of a device for generating fingerprints together with a database of fingerprints,

FIG. 2 schematically shows a client device connected to a server device via a network,

FIG. 3 shows a block schematic of a device for handling digital silence according to the invention,

FIG. 4 shows a flow chart of a method of handling digital silence according to a first embodiment of the invention,

FIG. 5 shows a flow chart of a method of handling digital silence according to a second embodiment of the invention,

FIG. 6 shows a block schematic of a first variation of a random number generating unit in the device in FIG. 3,

FIG. 7 shows a second variation of a random number generating unit for a device for handling digital silence according to the invention, and

FIG. 8 shows an optical disc on which program code for performing the invention is stored.

DETAILED DESCRIPTION OF EMBODIMENTS

The present invention relates to the field of providing fingerprints for digital media signals and will in the following be described in relation to fingerprinting of audio signals. It is however not limited to audio but can be applied to other media signals, for instance video.

FIG. 1 shows a block schematic of a fingerprinting device 10 or fingerprint generating unit connected to a database 21 and arranged to generate sub-fingerprints based on an audio signal. The fingerprinting device 10 in FIG. 1 is intended to be provided in a client device which can communicate with a server, which includes the database. A client can contact this database in order to identify an audio signal via a fingerprint. In order to generate a fingerprint, the fingerprinting device 10 receives an audio signal at a downsampler 11, which downsamples the audio signal. The downsampled audio signal is then forwarded from the downsampler to a framing circuit 12, which divides the audio signal into (preferably overlapping) frames, which are weighted by a Hanning window. The thus framed audio signal is then forwarded to a Fourier transform circuit 13, which computes spectral representations of every frame. In a following block 14, absolute values of the Fourier coefficients are calculated. The device also includes a band division stage 15, which divides the frequency spectrum into a number of bands and includes a number of selectors 151, which select the Fourier coefficients of the respective band. To this band division stage 15 is connected an energy computing stage 16, which has a stage 161 for each band. The stage 16 computes the energy of the magnitudes of the Fourier coefficients of the respective bands. A bit derivation circuit 17 is connected to the energy computing stage 16. The bit derivation circuit 17 converts the energy levels of each band into bits and is for this purpose provided with a first subtractor 171, a frame delay 172, a second subtractor 173 and a comparator 174 for each band. The resulting sub-fingerprints of all successive frames are stored in a buffer 18 as a fingerprint. The fingerprinting device also includes a bit reliability determining circuit 19, which determines the reliability of the bits in the fingerprint. The fingerprint in the buffer 18 and the bit reliability information from the bit reliability determining circuit 19 are sent from the device 10 to a computer 20 provided in the server. The database 21 connected to the computer 20 has a number of stored fingerprints all comprising sub-fingerprints for a large number of audio signals or songs. In FIG. 1 there is also shown a look-up table 22, which the computer 20 uses when searching for a matching fingerprint in the database 21, which matching fingerprint corresponds to a fingerprint received from the device 10.
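The bit derivation described for circuit 17 can be illustrated with a minimal sketch. It assumes the rule of the cited Haitsma and Kalker scheme (a bit is 1 when the energy difference between two neighbouring bands has increased since the previous frame) and 33 bands per frame so that 32 bits result. The function name and the band count are illustrative assumptions, not part of this description.

```python
from typing import Sequence


def derive_sub_fingerprint(prev_energies: Sequence[float],
                           cur_energies: Sequence[float]) -> int:
    """Derive one sub-fingerprint from the band energies of two successive frames.

    Bit m is 1 when the energy difference between bands m and m+1 is larger
    in the current frame than in the previous (delayed) frame, otherwise 0.
    """
    if len(prev_energies) != len(cur_energies):
        raise ValueError("both frames need the same number of bands")
    word = 0
    for m in range(len(cur_energies) - 1):
        cur_diff = cur_energies[m] - cur_energies[m + 1]     # difference between neighbouring bands
        prev_diff = prev_energies[m] - prev_energies[m + 1]  # same difference, one frame earlier
        bit = 1 if (cur_diff - prev_diff) > 0 else 0         # difference over time, then comparator
        word = (word << 1) | bit
    return word


NUM_BANDS = 33  # assumption: 33 bands yield a 32-bit sub-fingerprint
silent_frame = [0.0] * NUM_BANDS
silence_word = derive_sub_fingerprint(silent_frame, silent_frame)
```

With all band energies equal to zero every difference is zero, the comparator never fires, and the sub-fingerprint is the all-zero word regardless of which clip the silence came from.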

One difference between the fingerprints in client and server is that the database includes fingerprints for whole audio signals, whereas a client normally only generates one or a few fingerprints for an audio signal. The functioning of the device shown in FIG. 1 and the generation of fingerprints as well as how matching of fingerprints is performed is described in more detail in the document “A Highly Robust Audio Fingerprinting System”, by Jaap Haitsma and Ton Kalker, Ismir, October 2002, which is herein incorporated by reference.

FIG. 2 shows a client device 24 connected to a server device 26 via a computer network 28, like the Internet. The client device 24 thus generates a fingerprint in the above-described way and sends this together with bit-reliability information as a query to the server 26 for audio signals in need of identification. The server 26 searches the database and returns information about the audio signal to the client. The returned information is normally metadata like name of song, artist etc. When doing this identification the server compares the sub-fingerprints in a fingerprint with the sub-fingerprints of audio signals stored in the database and returns a positive identification when the Hamming distance between two fingerprints is found to be below a certain threshold.
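The comparison made by the server can be sketched as follows. This is a simplified illustration only: it compares two equally long fingerprint blocks directly and ignores the look-up table and the bit-reliability information. The threshold value of 0.35 (as a fraction of differing bits) and the function names are assumptions chosen for the example.

```python
from typing import Sequence


def hamming_distance(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """Count the differing bits between two blocks of 32-bit sub-fingerprints."""
    if len(block_a) != len(block_b):
        raise ValueError("fingerprint blocks must have the same length")
    return sum(bin((a ^ b) & 0xFFFFFFFF).count("1")
               for a, b in zip(block_a, block_b))


def is_match(query: Sequence[int], stored: Sequence[int],
             max_bit_error_rate: float = 0.35) -> bool:
    """Report a positive identification when the distance is below the threshold."""
    total_bits = 32 * len(query)
    return hamming_distance(query, stored) < max_bit_error_rate * total_bits


# Two unrelated clips that both start with digital silence produce identical
# all-zero blocks, so the comparison reports a (wrong) match.
silent_query = [0x00000000] * 256
silent_stored = [0x00000000] * 256
false_match = is_match(silent_query, silent_stored)
```

The last lines show the failure case this document is concerned with: a Hamming distance of zero between two blocks that carry no information about the actual content.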

In the device described above identification of a piece of audio can be made quickly based on a fingerprint corresponding to approximately 3 seconds and containing 256 sub-fingerprints. This can however lead to some problems, which this invention will solve. Many audio signals or clips may start with silence, which can be a few seconds long. Many audio signals will therefore include information which actually represents silence. This means that there can be several audio signals, all of which also start with silence, that can be found to correspond to an audio file for which a fingerprint is taken. There is thus a need for taking care of this silence. In case of video this would correspond to a number of black frames at the beginning.

A device 30 for handling digital silence according to the invention is shown in a block schematic in FIG. 3. The device 30 includes a control unit 32 arranged to be connected to the buffer 18 of the fingerprinting device shown in FIG. 1 and a random number generating unit 34 connected to the control unit 32.

The functioning of the units in FIG. 3 will now be described for use in a client device together with FIG. 4, which shows a flow chart of a first embodiment of a method according to the invention. The client device first generates a number of sub-fingerprints for an audio signal in a fingerprinting device, step 42, which sub-fingerprints are stored in the buffer 18. The control unit 32 of the device 30 fetches these sub-fingerprints from the buffer 18 and investigates if any of these sub-fingerprints have zero values, i.e., correspond to digital silence in case of the described fingerprinting algorithm, step 44. If none of them do, the sub-fingerprints are kept unchanged in the buffer and then the investigation is ended, step 50. If they do include zero values, the control unit 32 contacts the random number generating unit 34, which generates random values, step 46. These random values are then submitted to the control unit 32, which replaces the zero valued sub-fingerprints with these random values in the sub-fingerprint buffer 18, step 48, whereupon the investigation is ended, step 50. When the client device later sends a query including a fingerprint where zero valued sub-fingerprints have been replaced by these random values to the server, the probability of finding a match in the database is very low, which avoids the return of a wrong match of the audio signal. If the client device has to make a positive identification it has to send another query later, when the audio signal is not silent, and then a positive identification can be made.
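The flow of the first embodiment can be sketched in a few lines. The sketch assumes that the fingerprint is held as a list of 32-bit integers and, purely for illustration, uses Python's `secrets.randbits` as a stand-in for the random number generating unit 34. The function and parameter names are hypothetical.

```python
import secrets
from typing import Callable, List, Sequence

SILENCE_WORD = 0x00000000  # sub-fingerprint produced by digital silence


def replace_silent_sub_fingerprints(
    sub_fingerprints: Sequence[int],
    random_word: Callable[[], int] = lambda: secrets.randbits(32),
) -> List[int]:
    """Return a copy in which every zero-valued sub-fingerprint is replaced.

    random_word stands in for the random number generating unit; any
    callable returning a 32-bit integer can be supplied.
    """
    result = []
    for word in sub_fingerprints:
        if word == SILENCE_WORD:                 # check for silence (step 44)
            word = random_word() & 0xFFFFFFFF    # generate (step 46) and substitute (step 48)
        result.append(word)
    return result


# Usage with a deterministic stand-in generator, so the result is predictable.
fixed_values = iter([0x0000ABCD, 0x12345678])
cleaned = replace_silent_sub_fingerprints(
    [0x00000000, 0xDEADBEEF, 0x00000000],
    random_word=lambda: next(fixed_values),
)
```

Non-zero sub-fingerprints pass through unchanged, so a fingerprint taken on a non-silent part of the signal is unaffected by this step.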

The device 30 can as an alternative be provided on the input side of the client device, i.e. before sub-fingerprints are generated. In this case the control unit 32 will be connected to a register where the actual audio signal is temporarily stored before being subject to fingerprinting. A method according to an alternative embodiment of the invention will now be described with reference being made to FIG. 5, showing a flow chart of a method according to this second embodiment. First the samples of the audio signal, which can consist of a number of PCM samples, are analysed by the control unit, step 52, for determining if there are any zero samples present or rather if there are samples that are beneath a certain lowest level, which would result in a sub-fingerprint of zero, step 54. If there are, the random number generator is made to generate random numbers, step 56. Thereafter the control unit 32 replaces the zero valued PCM samples or rather the samples under said threshold with the random values, step 58. Thereafter the samples of the audio signal are submitted to the fingerprinting device for generation of sub-fingerprints in the known way, step 60. Since the zero level samples of the audio signal have already been replaced, the sub-fingerprints subsequently generated for these samples will likewise be random in nature and therefore a match for silent parts of the audio signal in the database is less likely. In case there are no zero valued samples, step 54, fingerprint generation is performed directly, step 60.
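The second embodiment operates on the samples rather than on the sub-fingerprints. A minimal sketch follows, assuming signed integer PCM samples. The threshold, the noise amplitude and all names are hypothetical values chosen for illustration; the description does not specify them.

```python
import random
from typing import List, Optional, Sequence


def replace_silent_samples(
    samples: Sequence[int],
    threshold: int = 1,        # assumption: samples with |s| < threshold count as silence
    noise_amplitude: int = 8,  # assumption: peak value of the substituted noise
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return a copy of the PCM samples with silent samples replaced by noise."""
    rng = rng or random.Random()
    result = []
    for sample in samples:
        if abs(sample) < threshold:                                   # silence check (step 54)
            sample = rng.randint(-noise_amplitude, noise_amplitude)   # steps 56 and 58
        result.append(sample)
    return result


# Usage: a clip that starts with digital silence followed by audible samples.
clip = [0, 0, 0, 0, 1200, -900, 350]
prepared = replace_silent_samples(clip, rng=random.Random(2024))
# 'prepared' would then be handed to the fingerprinting device (step 60).
```

Samples at or above the threshold are returned unchanged, so only the silent stretch of the clip is affected.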

There are some other possible variations to the above-described scheme. One variation of the alternative embodiment of the invention is to add a small piece of random noise to all samples of the audio signal before a fingerprint is generated, i.e. also to the samples not corresponding to silence. It is furthermore possible either to remove the digital silence from the digital samples before fingerprinting is performed or to remove the sub-fingerprints which correspond to digital silence, instead of replacing them with random numbers. When this is done it is however not guaranteed that subsequent sub-fingerprints are spaced 11.8 ms apart. Then there is a risk that low-amplitude noise, which can be added to a radio broadcast audio signal instead of silence, will be a part of the fingerprint sent to a database. If the database has the corresponding silence removed, this will lead to a less than optimal match.

The unit in FIG. 3 can just as well be provided together with a fingerprinting device in the server as in the client, either before the fingerprinting device or after, as was described above. This ensures that the database will not have any sub-fingerprints having a zero value for a fingerprint of a piece of audio, since these are replaced by random words. Digital silence can also be removed in the server in the same way as was described in the paragraph above, by removing the digital silence samples or the sub-fingerprints corresponding to digital silence.

The sub-fingerprints generated are of 32 bits and a sub-fingerprint corresponding to silence is then the hexadecimal value 0x00000000. It is convenient to use a standard linear congruential random number generator for generating 32 bit random words to use for replacing the zero sub-fingerprints. The random number generator is initialised with a random number $X_0$. Subsequent random numbers are obtained according to equation (1) below.

$$X_{N+1} = (1664525 \cdot X_N + 1013904223) \bmod 2^{32} \qquad (1)$$
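Equation (1) translates directly into code. The following sketch uses the constants given in the equation; the function names and the seed value in the usage lines are illustrative.

```python
from typing import List

MODULUS = 2 ** 32
MULTIPLIER = 1664525
INCREMENT = 1013904223


def lcg_next(x: int) -> int:
    """One step of the linear congruential generator of equation (1)."""
    return (MULTIPLIER * x + INCREMENT) % MODULUS


def lcg_words(seed: int, count: int) -> List[int]:
    """Return `count` successive 32-bit words following the initial value X_0 = seed."""
    words = []
    x = seed % MODULUS
    for _ in range(count):
        x = lcg_next(x)
        words.append(x)
    return words


# Usage: the sequence is fully determined by the initial value.
first_run = lcg_words(seed=42, count=5)
second_run = lcg_words(seed=42, count=5)
```

Both runs produce the same five words, since every value after the first is computed deterministically from its predecessor.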

There is however a problem with the use of this method in case both the client and the server have fingerprints where this same type of random number generator has been used. Since the only real random number is the first number and all subsequent random numbers are computed in a known way from this first random number, there is a risk that both the devices will end up with the same random numbers for digital silence. This could lead to a matching of the fingerprint in the database based on the sequence of “random” sub-fingerprints for silence. If the database has about 1 million songs this risk is at least 1/4000 or 0.025%. In fact the risk is even higher than this because of the risk of matching between sub-fingerprints in a query and database provided in different positions in the fingerprint.

One way to solve this problem is to have different random number generating schemes for client and server. This would lead to different implementations of database and fingerprint query generation in server and client. Another solution to this problem will be described in relation to FIG. 6 below.

FIG. 6 shows a first variation of a random number generating unit 34, which includes a standard linear congruential random number generator 36 connected to a first input of a logical unit 40, which in this case is a logical Exclusive-OR unit 40. The logical unit 40 receives a value $V(t_{sys})$ on a second input, which value is a 32-bit value that is dependent on the date and time of the generation of the fingerprint. The value $V(t_{sys})$ is dependent on the system time of the computer where the random number generator is provided. This makes the subsequent random values not only dependent on the first random value but also on the current system time and date.
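A minimal sketch of the Exclusive-OR combination in FIG. 6 follows. How the 32-bit value V(t_sys) is derived from the system time is not specified here, so the sketch assumes the low 32 bits of the time in microseconds. That derivation and the function names are assumptions made for illustration.

```python
import time
from typing import Optional

MASK32 = 0xFFFFFFFF


def time_value(t_sys: Optional[float] = None) -> int:
    """Hypothetical V(t_sys): low 32 bits of the system time in microseconds."""
    if t_sys is None:
        t_sys = time.time()
    return int(t_sys * 1_000_000) & MASK32


def time_mixed_word(lcg_word: int, t_sys: Optional[float] = None) -> int:
    """Exclusive-OR a generator output with the time-dependent value."""
    return (lcg_word ^ time_value(t_sys)) & MASK32


# Usage: the same generator output yields different words at different times.
word_at_t1 = time_mixed_word(0x12345678, t_sys=1.0)
word_at_t2 = time_mixed_word(0x12345678, t_sys=2.0)
```

Passing `t_sys` explicitly makes the example reproducible; in use it would be left out so that the current system time is taken.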

The probability that the random values substituted for digital silence coincide in the client and the server is therefore reduced significantly.

One variation of this latter unit is shown in FIG. 7. FIG. 7 shows a Linear Feedback Shift Register (LFSR) circuit 62, which is used for generation of random bits. The unit includes a number of tapped delay lines τ, 64-72. The delays are connected in series and the last 72 is connected to the output 94 of the random number generating unit 62. A multiplying unit $g_1$ 82, $g_2$ 84, . . . $g_{29}$ 78, $g_{30}$ 76 and $g_{31}$ 74 is provided between each delay unit. The multiplication factor can be either 1 or 0. Each multiplying unit is connected to a corresponding adding unit 84-92, of which a last 92 is also connected directly to the output 94 and a first 84 is connected to the input of the first delay unit 64. In order to produce 32-bit random numbers one needs 32 of these Linear Feedback Shift Registers. Each of the 32 LFSRs is initialised with a different 32-bit number derived from the computer system time. Every LFSR generates 1 random bit. Since every LFSR is initialised with a 32-bit number that depends on the system time, the cycle of this implementation also depends on the system time.
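The arrangement of 32 registers, each contributing one bit per output word, can be sketched as below. The feedback taps (32, 22, 2, 1) and the way 32 different seeds are derived from a single time value are assumptions made for the example, as are all names; the multiplication factors of FIG. 7 are not given in this description.

```python
from typing import List

MASK32 = 0xFFFFFFFF
TAPS = (32, 22, 2, 1)  # assumption: one commonly listed tap set for a 32-bit LFSR


class Lfsr32:
    """A 32-bit linear feedback shift register that emits one bit per step."""

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK32
        if self.state == 0:
            self.state = 1  # an all-zero register would stay at zero forever

    def next_bit(self) -> int:
        feedback = 0
        for tap in TAPS:
            feedback ^= (self.state >> (tap - 1)) & 1   # XOR of the tapped stages
        out = (self.state >> 31) & 1                    # bit leaving the last stage
        self.state = ((self.state << 1) | feedback) & MASK32
        return out


def make_bank(time_value: int) -> List[Lfsr32]:
    """Hypothetical seeding: derive 32 seeds from one time value."""
    return [Lfsr32((time_value + 0x9E3779B9 * (i + 1)) & MASK32)
            for i in range(32)]


def random_word(bank: List[Lfsr32]) -> int:
    """Assemble one 32-bit word from one bit of each register."""
    word = 0
    for register in bank:
        word = (word << 1) | register.next_bit()
    return word


# Usage: a bank seeded from a (here fixed) time value produces 32-bit words.
bank = make_bank(time_value=12345)
words = [random_word(bank) for _ in range(4)]
```

In use, `time_value` would be taken from the system clock, so that banks created at different moments start from different register states.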

The present invention is preferably implemented with one or more processors with associated program memory in which the program code for performing the method according to the invention is stored. The program code can also be provided on a data carrier, like a CD-ROM disc 96 as is shown in FIG. 8. The program code can also be downloaded to a device from a server via a network, like the one shown in FIG. 2.

The present invention has several advantages. It reliably avoids the wrong identification of media signals in which digital silence is included. It is also easy to implement since it uses some of the functionality already provided in a computer. In a variation of the invention, the random numbers generated almost certainly do not cause false identifications.

The present invention has been described in relation to computers in a computer system. However, it is not limited to this, but can be implemented in other types of environments, for instance in a mobile phone communicating with a server via a cellular network. A mobile phone can also be made to communicate with a computer that is a client device connecting to a server including the above-mentioned database. The invention is furthermore not limited to the described fingerprinting scheme, but can be implemented in any fingerprinting scheme that has to be capable of handling digital silence. The invention was described in relation to PCM samples. It should be realised that it is also applicable when different types of compression and coding are used, like MP3-coding, as well as for other types of media signals like video. Therefore the present invention is only to be limited by the following claims.

In summary, the invention relates to a method, a device, a client-server system as well as a computer program product and computer program element for handling digital silence when fingerprinting digital media signals. A fingerprint comprising a number of sub-fingerprints for at least a part of the digital media signal is generated (step 42), and the influence of at least one piece of the media signal on the fingerprint is removed or changed (step 48), which piece corresponds to digital silence. The invention reliably avoids a wrong identification of media signals, such as audio signals, where digital silence is included. The invention is also easy to implement, since it only requires some of the functionalities already provided in a computer.

1. Method of handling digital silence when fingerprinting a digital media signal comprising the steps of: generating a fingerprint comprising a number of sub-fingerprints for at least a part of the digital media signal, (step 42; 60) and removing or changing the influence of at least one piece of the media signal on the fingerprint, (step 48; 58), which piece corresponds to digital silence.
2. Method according to claim 1, wherein the step of removing or changing the influence comprises removing the piece of the digital media signal before generating a fingerprint.
3. Method according to claim 1, wherein the step of removing or changing the influence comprises removing a sub-fingerprint from the fingerprint having a value corresponding to digital silence of said piece of the media signal.
4. Method according to claim 1, wherein the step of removing or changing the influence comprises providing a random value for said piece of the media signal corresponding to digital silence.
5. Method according to claim 4, wherein the step of providing a random value comprises adding a random value to each piece of the media signal.
6. Method according to claim 4, wherein the step of providing a random value comprises substituting a sub-fingerprint having a value corresponding to digital silence in the media signal with a random value, (step 48).
7. Method according to claim 4, wherein the step of providing a random value comprises substituting a piece of the media signal corresponding to digital silence with a piece corresponding to random noise before starting generation of a fingerprint, (step 58).
8. Method according to claim 4, wherein the method is performed in a first device (24) and the way random values are generated in the first device differs from the way random values are generated in a second device (26), with which the first device is communicating in order to identify a media signal.
9. Method according to claim 4, wherein the step of providing a random value comprises generating a random value using a random number generator.
10. Method according to claim 9, further including the step of processing the random value with additional information that is dependent on time and date information related to the generation of the fingerprint.
11. Method according to claim 10, wherein the step of processing comprises performing an exclusive-or operation on the random value and the additional information.
12. Method according to claim 10, wherein the processing is provided through a number of linear feedback shift registers.
13. Method according to claim 1 further including the step of transferring the fingerprint to a server for matching against a fingerprint database.
14. Method according to claim 1, further including the step of storing the fingerprint in a server fingerprint database to be used for matching against fingerprints received from client devices.
15. Device (24; 26) for handling digital silence when fingerprinting digital media signals and comprising: a fingerprint generating unit (10) arranged to generate a fingerprint comprising a number of sub-fingerprints for at least parts of a digital media signal, and a digital silence removal unit (30) arranged to remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.
16. Device according to claim 15, wherein the silence removal unit (30) includes a random number generating unit (34; 62) for generating a random value for the piece of the media signal corresponding to digital silence.
17. Device according to claim 16, wherein the silence removal unit (30) is arranged to substitute a sub-fingerprint generated by the fingerprint generating unit having a value corresponding to digital silence in the media signal with a random value.
18. Device according to claim 16, wherein the silence removal unit (30) is arranged to substitute the piece of the media signal corresponding to digital silence with a piece corresponding to random noise before submission to the fingerprint generating unit for generating a fingerprint.
19. Device according to claim 16, further including a logical function unit (40) arranged to process the random value with additional information that is dependent on time and date information related to the generation of the fingerprint.
20. Device according to claim 19, wherein the logical function unit (40) is an exclusive-or unit.
21. Device according to claim 16, wherein the random number generating unit (62) is provided as a number of linear feedback shift registers.
22. Device according to claim 15, wherein the device is a client device (24) arranged to generate fingerprint queries to a server device (26) including a database (21) of fingerprints for a number of different media signals.
23. Device according to claim 15, wherein the device is provided in a server (26) including a database (21) of fingerprints for a number of different media signals used for communication with at least one client device (24).
24. System of devices for handling digital silence when fingerprinting digital media signals and comprising: a server device (26) having a database (21) of fingerprints related to media signals stored as media files, and a client device (24) for generating fingerprint queries to the server device, wherein at least one of client and server device comprises: a fingerprint generating unit (10) arranged to generate a number of sub-fingerprints for at least parts of a digital media signal, and a silence removal unit (30) arranged to remove or change the influence of at least one piece of the media signal on the fingerprinting, which piece corresponds to digital silence.
25. Computer program product for handling digital silence when fingerprinting digital media signals, to be used on a computer, comprising a computer readable medium (96) having thereon computer program code means, to make the computer execute, when said program is loaded in the computer: generate a number of sub-fingerprints for at least parts of a digital media signal, and remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.
26. Computer program element for handling digital silence when fingerprinting digital media signals, to be used on a computer, said computer program element comprising computer program code means, to make the computer execute, when said program is loaded in the computer: generate a number of sub-fingerprints for at least parts of a digital media signal, and remove or change the influence of at least one piece of the media signal on the fingerprint, which piece corresponds to digital silence.